File Formats
File formats define how information is encoded in a digital file. File formats can be standardised, open, well documented and possibly associated with a reference implementation for how software should interact with files of that format. But file formats are not always as clearly defined, and format specifications are not always closely followed by the software that implements them. Understanding file formats and how we interact with them in practice can therefore be critical to ensuring effective digital preservation. This page points to the best sources of further information on file formats. For a broad introduction to file formats and digital preservation, see the DPC Handbook:
See also, the DPC Technology Watch Reports:
- File formats for Preservation, by Malcolm Todd 2009
- Preserving Computer-Aided Design (CAD) by Alex Ball 2013
- JPEG 2000 - a Practical Digital Preservation Standard? by Robert Buckley, Ph.D. 2008
- Preserving the Data Explosion: Using PDF by Betsy A. Fanning 2008
Understanding the broader challenges associated with file formats
A number of pieces of work have sought to develop methods of assessing the appropriateness of particular file formats for preservation, typically based on high-level criteria. This includes the now somewhat dated DPC Tech Watch report. More recent thinking has begun to move away from this approach, recognising the need to base decisions on practical experience of working with file formats and software:
- Assessing file format risks: searching for Bigfoot?
- Sustainability Assessments at the British Library: Formats, Frameworks and Findings
Precision and completeness are not qualities that can always be associated with file format specifications, and this problem lies at the root of many preservation challenges:
- The Network is the Format: PDF and the Long-term Use of Digital Content
- “More What You’d Call ‘Guidelines’ Than Actual Rules”: Variation in the Use of Standards
Examples from the Information Security community, while not typical of the preservation challenges we are likely to experience, illustrate the flexibility in many file format specifications:
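As a small illustration of this kind of flexibility, the sketch below builds a file that begins with a GIF signature yet can still be opened as a ZIP archive. It relies on the fact that ZIP readers locate an archive's contents from a directory at the end of the file, so extra bytes at the front are tolerated. The file contents and variable names are invented for the example, and the result is not a complete GIF image; it only shows how a signature check and a real parser can disagree about the same bytes.

```python
import io
import zipfile

# Build a small, ordinary ZIP archive in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")

# Prepend a GIF signature. ZIP readers find the archive's directory
# from the end of the file, so leading bytes are tolerated.
polyglot = b"GIF89a" + zip_buffer.getvalue()

# A check on the first bytes suggests GIF...
starts_like_gif = polyglot.startswith(b"GIF89a")

# ...while Python's zipfile module still treats the bytes as a ZIP.
still_a_zip = zipfile.is_zipfile(io.BytesIO(polyglot))
with zipfile.ZipFile(io.BytesIO(polyglot)) as archive:
    recovered = archive.read("hello.txt")

print(starts_like_gif, still_a_zip, recovered)
```

Files like this are deliberately constructed, but the same looseness is what allows everyday software to write files that depart from a specification in less dramatic ways.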
File format identification
Applying a specialist software tool to identify the formats of files to be preserved is typically one of the first steps in a digital preservation workflow. Read more about File format identification here...
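To give a feel for what such a tool does, here is a minimal sketch of signature-based identification: it compares the first bytes of a file against a handful of well-known "magic number" signatures. The signature table and the function names `identify_bytes` and `identify_file` are hypothetical and written for this example. Specialist identification tools draw on far larger signature registries and more sophisticated matching rules, so this should not be treated as a substitute for them.

```python
# Illustrative signature table (an assumption for this sketch, and far
# from complete): leading bytes mapped to a broad format name.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify_bytes(header: bytes) -> str:
    """Return the first format whose signature matches the start of header."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def identify_file(path) -> str:
    """Read the first few bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify_bytes(handle.read(16))


print(identify_bytes(b"%PDF-1.7\n"))
print(identify_bytes(b"plain text"))
```

Note that "ZIP container" is as far as a leading-bytes check can go here: many formats, including common office document formats, are packaged as ZIP files, and telling them apart requires looking inside the container.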
Seeking reference information and guidance on specific formats
There are a number of excellent sources of information to assist digital preservationists. Wikipedia remains a good place to start for high-level information about a particular file format. The associated Wikidata is also the focus of the latest effort to build a collaborative registry of file format information.
A small number of libraries and archives have been developing their own preservation-focused assessments of particular file formats. These provide useful guidance on the risks associated with common file formats, and approaches for addressing them. They are located in different places on the web, but are linked from the home of a loose collaboration between these organisations on the DPC Wiki:
The Just Solve wiki provides a community-driven site for gathering information about different file formats and is particularly good for discovering information on more obscure file formats:
Articles
Announcing the First release of the Digital Preservation Workbench
A second prototype service is now available from the Registries of Good Practice project, following on from the release of the Digital Preservation Publications Index in May. The Digital Preservation Workbench brings together a suite of different experimental interfaces and tools, aiming to help improve the practice of digital preservation and to understand what kinds of information systems we really need.
DPC Reading Club: How the concept of AI technology impacts digital archival expertise
Today’s Reading Club session was a thought-provoking discussion inspired by an article from Amber Cushing and Giulia Osti in the Journal of Documentation - “So how do we balance all of these needs?”: how the concept of AI technology impacts digital archival expertise (https://doi.org/10.1108/JD-08-2022-0170). The article summarized the thoughts and expectations of a focus group of archival practitioners around Artificial Intelligence (AI) and the impact on expertise within the sector. After a...
File format recommendations - I wouldn’t say they are unacceptable, but I wouldn’t recommend them either
Last week I joined a webinar entitled “A Comparison of Recommended File Formats and the New Dutch Method for File Format Assessment”. It’s great to see the outcomes of this collaborative work, and it’s clear that it has already played an important role in bringing out some key themes in the preservation approaches of various organizations. But I felt that a number of aspects give cause for concern. The collation of file format policies has highlighted some approaches that I believe should be...
Title: Preservation Digitisation Project – Digitising the Tasmanian Archives audio visual collection
Karin Haveman is Acting Manager Government Archives and Preservation at the Tasmanian Archives and Digitisation Services Coordinator. In February 2021, Libraries Tasmania launched the Preservation Digitisation Project – a major collaborative project that brings together Digitisation Services, System Support and Delivery, Government Archives, and the Community Archives teams. The aim of this project is to digitise our Tasmanian film, sound, and video collections for long-term...
Digital preservation at the National Library of Australia
Libor Coufal is Assistant Director for Digital Preservation at the National Library of Australia. We are very mindful that it has been (not quite all, but mostly) quiet on the NLA communication front in the last several years, while we have busily worked on implementing our digital preservation program. Our attendance at this year’s iPres (our first since 2014) was a great opportunity to pause and reflect on the progress we have made. We would like to update the community on what we have...
Further resources and case studies
Case studies: Here are some examples of how DPC RAM has been used by members of the community to help track their progress in digital preservation. If you have an example of DPC RAM in action that you would like to share, please contact us: Assessing where we are with digital preservation (2021) - a blog post from Fabiana Barticioti, Digital Assets Manager at LSE Library. From 'starting digital preservation' to 'business as usual' (2021) - a blog post from Anna McNally, Senior...
Digital Preservation of Community Archives: Breaking down barriers to digital preservation through training
Dr Deborah Thorpe is Education and Outreach Manager for the Digital Repository of Ireland. This autumn, the Digital Repository of Ireland (DRI) held an online introductory training programme in digital preservation for our members, with a particular focus on the training and community-building needs of community archivists. This course has been helping to break down barriers to digital preservation, by making topics such as appraising your digital collections for preservation,...
Understanding User Needs: Technology Watch Guidance Note on Access to digital collections available on general release
The DPC has released the next in its series of Technology Watch Guidance Notes on Access to digital collections. The new Guidance Note entitled Understanding User Needs by Sharon McMeekin is available to the digital preservation community from today. Understanding User Needs provides a pragmatic approach to conducting and interpreting a user needs analysis, whilst highlighting the importance and significance of the results.
Level up with DPC RAM
The DPC’s Rapid Assessment Model is a helpful tool for assessing an organization’s maturity with digital preservation. It allows you to consider both where you are currently and where you would like to be, and highlight gaps in your current digital preservation capacity. This resource is designed to help you work out how to address those gaps and move up the levels of RAM. For each of the 11 sections of RAM there are helpful tips, links to useful resources and case studies...